The human genome contracts again
Abstract
The number of human genomes that have been sequenced completely for different individuals has increased rapidly in recent years. Storing and transferring complete genomes between computers for the purpose of applying various applications and analysis tools will soon become a major hurdle, hindering the analysis phase. Therefore, there is a growing need to compress these data efficiently. Here, we describe a technique to compress human genomes based on entropy coding, using a reference genome and known Single Nucleotide Polymorphisms (SNPs). Furthermore, we explore several intrinsic features of genomes and information in other genomic databases to further improve the compression attained. Using these methods, we compress James Watson's genome to 2.5 megabytes (MB), improving on recent work by 37%. Similar compression is obtained for most genomes available from the 1000 Genomes Project. Our biologically inspired techniques promise even greater gains for genomes of lower organisms and for human genomes as more genomic data become available.
Availability: Code is available at sourceforge.net/projects/genomezip/
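The general idea of reference-based compression with a known-SNP list can be illustrated with a minimal Python sketch. It is not the authors' implementation (the genomezip code is not reproduced here), and every function name in it is hypothetical. It assumes substitutions only (no insertions or deletions), sequences of equal length, and a known-SNP list given as (position, base) pairs. Known SNPs are recorded as one presence flag each, novel SNPs as (gap, base) pairs, and an order-0 Shannon estimate stands in for the entropy coder.

```python
import math
from collections import Counter


def diff_against_reference(reference, sample):
    """List (position, base) for every substitution; assumes equal lengths."""
    if len(reference) != len(sample):
        raise ValueError("this sketch handles substitutions only")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]


def encode_snps(snps, known_snps):
    """Split position-sorted SNPs into flags over the known list and novel (gap, base) pairs."""
    known_index = {snp: i for i, snp in enumerate(known_snps)}
    known_flags = [0] * len(known_snps)
    novel = []
    prev = 0  # position of the previous novel SNP (0 before the first one)
    for pos, base in snps:
        idx = known_index.get((pos, base))
        if idx is not None:
            known_flags[idx] = 1  # one flag instead of position + base
        else:
            novel.append((pos - prev, base))  # small gaps are cheaper to code
            prev = pos
    return known_flags, novel


def decode(reference, known_snps, known_flags, novel):
    """Rebuild the sample from the reference and the encoded differences."""
    seq = list(reference)
    for flag, (pos, base) in zip(known_flags, known_snps):
        if flag:
            seq[pos] = base
    pos = 0
    for gap, base in novel:
        pos += gap
        seq[pos] = base
    return "".join(seq)


def entropy_bits(symbols):
    """Order-0 Shannon estimate of the total bits an ideal entropy coder would need."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c * math.log2(c / total) for c in counts.values())
```

In this sketch, only the flag vector and the novel list need to be stored, since the reference and the known-SNP list are assumed to be available to both encoder and decoder. Calling `entropy_bits` on the flag vector gives a rough lower bound on its coded size under a memoryless model; a real coder would use richer models.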
Similar resources
O-14: General Governing Rules of ART Contracts Involving Third Parties
Background: ART contracts involving third parties have been created while clinical reproductive treatments are globally widespread. Iran is a pioneer in applying these treatments in the Middle East because Shi'a jurisprudence prescribes them. This key role in the region has raised Iranian jurists' responsibility in developing a legal system regarding administration of ART. The most significant part of...
O-38: Concurrent Whole-Genome Haplotyping and Copy-Number Profiling of Single Cells
Background: Methods for haplotyping and DNA copy-number typing of single cells are paramount for studying genomic heterogeneity and enabling genetic diagnosis. Before analyzing the DNA of a single cell by microarray or next-generation sequencing, a whole-genome amplification (WGA) process is required, but it substantially distorts the frequency and composition of the cell’s alleles. As a conseque...
I-44: Concurrent Whole-Genome Haplotyping and Copy-Number Profiling of Single Cells
Background: Methods for haplotyping and DNA copy-number typing of single cells are paramount for studying genomic heterogeneity and enabling genetic diagnosis. Before analyzing the DNA of a single cell by microarray or next-generation sequencing, a whole-genome amplification (WGA) process is required, but it substantially distorts the frequency and composition of the cell’s alleles. As a conseque...
MicroRNAs as Immune Regulators of Inflammation in Children with Epilepsy
Epilepsy is a chronic clinical syndrome of brain function caused by abnormal discharge of neurons. MicroRNAs (miRNAs) are small noncoding RNAs that act post-transcriptionally to negatively regulate protein levels. They affect neuroinflammatory signaling, glial and neuronal structure and function, neurogenesis, cell death, and other processes linked to epileptogenesis. The aim of this ...
I-38: Chromosome Instability in the Cleavage Stage Embryo
Recently, we demonstrated chromosome instability (CIN) in human cleavage stage embryogenesis following in vitro fertilization (IVF). CIN does not necessarily undermine normal human development (i.e. when the remaining normal diploid blastomeres develop the embryo proper); however, it can spark a spectrum of conditions, including loss of conception, genetic disease and genetic variation development. To s...
Journal title:
- Bioinformatics
Volume 29, Issue 17
Pages -
Publication date 2013